library(tidyverse)
library(dplyr)
library(janitor)
library(here)
library(wordcloud)
library(RColorBrewer)
library(patchwork)
library(ggwordcloud)
library(paletteer)

# figure out how to hide the text output and keep the code, ideally allowing the reader to open and close the code in the knitted document through some variation of: include = FALSE, collapse = TRUE, class.source = 'fold-hide', results = 'hide'

Hi there! This post is coming to you from Juliet Cohen and Scout Leonard! We are both students at UCSB’s Bren School of Environmental Science & Management in the inaugural cohort of the Master of Environmental Data Science (MEDS) program.

Scout is interested in growing as an environmental data scientist after having worked with large datasets through food system and food security work in Oakland, California. Already, she is so pleased with the emphasis her MEDS courses have placed on responsible data science, as she hopes to use her MEDS toolkit to influence policies and programs which build sustainable, equitable food systems. You can learn more about her previous and ongoing work at her website.

Juliet is interested in applying environmental data science to wildlife biology and the interaction between humans and endangered species’ natural habitats. Juliet was inspired by her experience serving as a field technician in California and Hawaii. Throughout the MEDS summer curriculum, Juliet strengthened her collaborative programming skills and looks forward to learning about spatial analysis and modelling large, dynamic data sets. She hopes to contribute to open-source projects in the future.

For the first six weeks of our degree, we embarked on a whirlwind of introductory data science. From 9 AM to 5 PM, we sat in the newly constructed MEDS classroom at the National Center for Ecological Analysis and Synthesis (NCEAS) in downtown Santa Barbara, learning the basics we needed to jump-start our data science degrees. These one- to two-week classes consisted not only of lectures, but also of coding labs, collaborative team science projects, and individual and group presentations. We also had “flex sessions” for non-course content, like panels with data scientists at NCEAS and representatives from local groups of R users, such as Santa Barbara R Ladies and Eco Data Science. Our summer term laid the foundation not only for the code we’ll write this year, but also for how we create workflows and collaborate as growing scientists. Our course load included:

  • EDS 212: Essential Math in Environmental Data Science
  • EDS 221: Scientific Programming Essentials
  • EDS 214: Analytical Workflows and Scientific Reproducibility
  • EDS 215: Introduction to Data Storage and Management
  • EDS 216: Meta-Analysis and Systematic Reviews

After this intense MEDS summer, Juliet and Scout took some time to relax before fall quarter, but we also took this blog-post-writing opportunity to showcase some of what our class has learned so far.

As we reflected on the first quarter of our degree, we decided it may be more interesting to show, rather than tell, some of the skills we’ve learned. We also thought it might be fun to share reflections from our whole cohort to truly represent the student experience of this fast-paced, learning-filled summer.

What did MEDS students think about summer courses?

Survey Development

As such, we developed a survey (in Google Forms) to send to our classmates and gather data about their perspectives on these first six weeks. We had two main questions: (1) What did the cohort think of our classes? (2) What kinds of fun things has the group been up to in our delightful home of Santa Barbara?

We shared the form with our peers to collect their feedback on these questions. We wrote the questions with data tidying and visualization in mind, and when we got some unexpected answers, we gained perspective on how a better-designed survey could have saved us some wrangling. MEDS students, however, do not shy away from problematic data, so we persisted through the problems that arose :)

Data Visualizations

The following is a description of how we visualized the most interesting survey data submitted by our peers, including neat graphics describing our MEDS summer, from tidying data to surfing after class. We hope you find it insightful, but also fun :)

Our Data

First, we read the data from our Google form survey, which we were able to download as a .csv file, into the R project we made for this blog project.

data <- read_csv(here("data", "MEDS Summer Reflection Survey (Responses) - Responses Clean.csv"))

Data Tidying

Next, we renamed the columns. Google Forms does this frustrating but understandable thing where the column names are the full questions we asked participants. This makes for super long names that are not fun to write code with. We instead named the columns of interest after the weeks in which we took each course, i.e. column "1" is our first course, EDS 212, and column "2_3" is EDS 221, which ran during weeks 2 and 3.

# rename the long question columns with short names based on course week
# (run colnames(data) to see the original names)
data_clean <- data %>% 
  rename("1" =  "Write 3 words to describe or represent week 1 (EDS212 w/ Allison) here:") %>% 
  rename("2_3" =  "Write 3 words to describe or represent weeks 2 & 3 (EDS221 w/ Allison) here:") %>% 
  rename("4" =  "Write 3 words to describe or represent week 4 (EDS214 w/ Julien) here:") %>% 
  rename("5" =  "Write 3 words to describe or represent week 5 (EDS215 w/ Frew) here:") %>% 
  rename("6" =  "Write 3 words to describe or represent week 1 (EDS216 w/ Scott) here:")

Data Wrangling

The Google form format included five questions (one for each summer course) where students wrote in three words to describe how they felt about the course. In the .csv of the survey data, the three words submitted by a participant were grouped together in one cell per course.

To visualize how often certain descriptors for each summer course appear, we first needed to separate the three terms submitted for each course into separate observations. We executed this using the separate_rows() function. This expanded the terms into three separate observations per student in each class column.

# separate the columns into rows by parsing the 3 words in each observation into 3 different observations
# select certain cols because our first data viz is only using certain cols 
data_clean_1 <- data_clean %>% 
  separate_rows("1") %>% 
  select("Email Address","1")

data_clean_2_3 <- data_clean %>% 
  separate_rows("2_3")%>% 
  select("Email Address","2_3")

data_clean_4 <- data_clean %>% 
  separate_rows("4") %>% 
  select("Email Address","4")

data_clean_5 <- data_clean %>% 
  separate_rows("5") %>% 
  select("Email Address","5")

data_clean_6 <- data_clean %>% 
  separate_rows("6") %>% 
  select("Email Address","6")
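The five blocks above follow the same pattern, so they could also be written as a single pipeline. The following is only a minimal sketch on made-up data, not the code we ran: the `toy` data frame, its `id` column, and the responses in it are hypothetical stand-ins for the survey, and it assumes the three words are separated by commas or spaces.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for the renamed survey data (not real responses)
toy <- tibble(
  id    = c("a", "b"),
  `1`   = c("fun, fast, math", "fast math review"),
  `2_3` = c("loops, functions, fun", "fun, long, useful")
)

word_counts <- toy %>%
  # stack the course columns into one long column of responses
  pivot_longer(cols = -id, names_to = "course", values_to = "words") %>%
  # split each response into one word per row
  # (the default separator is any run of non-alphanumeric characters)
  separate_rows(words) %>%
  # count how often each word appears within each course
  count(course, words, name = "freq")

word_counts
```

Keeping the course in its own column means one data frame can feed every plot, at the cost of filtering by `course` before drawing each cloud.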

Then, to observe the frequency of words used to describe each course, we used the table() function. This creates a table of counts, which we then converted to a data frame with two columns: one for each distinct descriptive word students used, and one for the frequency with which that word occurs for the class.

# use the table() function to take counts of each "factor" (words) and use the data.frame() function to convert these tables to data frames

course_1 <- data.frame(table(data_clean_1$"1"))

course_1_df <- as.data.frame.matrix(course_1)

course_2_3 <- data.frame(table(data_clean_2_3$"2_3"))

course_2_3_df <- as.data.frame.matrix(course_2_3)

course_4 <- data.frame(table(data_clean_4$"4"))

course_4_df <- as.data.frame.matrix(course_4)

course_5 <- data.frame(table(data_clean_5$"5"))

course_5_df <- as.data.frame.matrix(course_5) %>% 
  filter(!Var1 == "tangent") %>% 
  filter(!Var1 == "tangents") %>% 
  filter(!Var1 == "dry")

course_6 <- data.frame(table(data_clean_6$"6"))

course_6_df <- as.data.frame.matrix(course_6)
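To make the shape of these frequency tables concrete, here is a small base R sketch using made-up words (not survey data). The `toy` data frame and its `word` column are hypothetical. Because the argument to `table()` is an expression like `toy$word` rather than a bare variable name, the resulting columns get the default names `Var1` and `Freq`, which are the names the plots rely on.

```r
# Hypothetical words, for illustration only
toy <- data.frame(word = c("fun", "fast", "fun", "math", "fast", "fun"))

# table() tallies each distinct value; data.frame() turns the tally into
# a two-column data frame: Var1 (the word) and Freq (its count)
word_freq <- data.frame(table(toy$word))

names(word_freq)
word_freq
```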

Data Visualizing

After this, we wanted to visualize the frequency of words for each class. We used ggplot to create word clouds that represent the frequency of class descriptors. The word clouds display the descriptive words submitted by our classmates in sizes proportional to how often the words were used. The ggplot extension we used is called ggwordcloud. We updated the colors by adding an aesthetic mapping so that each class descriptor is represented by a different color. We also adjusted the maximum word size so that the clouds were easier to read.

Since we used ggplot quite a bit in EDS 221, we opted for this method for generating word clouds over another package called wordcloud, which is specifically for word clouds. We found that we wanted to showcase our visualization skills by stacking visualizations and adding titles and colors, and it was easier to do this in a ggplot version of word clouds, since we had so much practice with other plots in ggplot.

We also tried the wordcloud package. It made the correct visualizations, but we found it more difficult to make them as nice as some of the ggplots we made in class. It was a relief to learn that there is a word cloud feature in ggplot!
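Our original wordcloud chunk is not reproduced in this post. As a rough sketch of what that approach can look like (not our original code), the example below draws a cloud from a small made-up frequency table. The `toy_freq` data, the seed, the `scale` values, and the `Dark2` palette are all assumptions chosen for illustration.

```r
library(wordcloud)
library(RColorBrewer)

# Hypothetical word frequencies (not survey data), shaped like the
# output of data.frame(table(...))
toy_freq <- data.frame(
  Var1 = c("fun", "fast", "math", "review", "derivatives"),
  Freq = c(5, 3, 2, 1, 1)
)

set.seed(42)  # word placement is random, so fix the seed for a repeatable layout
wordcloud(
  words = toy_freq$Var1,
  freq = toy_freq$Freq,
  min.freq = 1,          # keep words that appear only once
  scale = c(4, 1),       # largest and smallest text sizes
  random.order = FALSE,  # plot the most frequent words first
  colors = brewer.pal(5, "Dark2")
)
```

This draws with base graphics rather than returning a ggplot object, which is part of why titles, themes, and stacking with patchwork were less convenient for us.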

And finally, the ggplot word clouds!

# for each cloud, specify the background color within the theme to match the background color of the blog

cloud_1 <- ggplot(course_1_df, aes(label = Var1, size = Freq, color = Var1)) +
  geom_text_wordcloud() +
  scale_size_area(max_size = 20) +
  theme(plot.title = element_text(size = 25),
        panel.background = element_rect(fill = "white")) +
  labs(title = "Week 1 EDS 212: Essential Math in Environmental Data Science")

cloud_1
cloud_2_3 <- ggplot(course_2_3_df, aes(label = Var1, size = Freq, color = Var1)) +
  geom_text_wordcloud() +
  scale_size_area(max_size = 17) +
  theme(plot.title = element_text(size = 25),
        panel.background = element_rect(fill = "white")) +
  labs(title = "Weeks 2 & 3 EDS 221: Scientific Programming Essentials")

cloud_2_3
cloud_4 <- ggplot(course_4_df, aes(label = Var1, size = Freq, color = Var1)) +
  geom_text_wordcloud() +
  scale_size_area(max_size = 20) +
  theme(plot.title = element_text(size = 25),
        panel.background = element_rect(fill = "white")) +
  labs(title = "Week 4 EDS 214: Analytical Workflows and Scientific Reproducibility")

cloud_4
cloud_5 <- ggplot(course_5_df, aes(label = Var1, size = Freq, color = Var1)) +
  geom_text_wordcloud() +
  scale_size_area(max_size = 20) +
  theme(plot.title = element_text(size = 25),
        panel.background = element_rect(fill = "white")) +
  labs(title = "Week 5 EDS 215: Introduction to Data Storage and Management")

cloud_5
cloud_6 <- ggplot(course_6_df, aes(label = Var1, size = Freq, color = Var1)) +
  geom_text_wordcloud() +
  scale_size_area(max_size = 20) +
  theme(plot.title = element_text(size = 25),
        panel.background = element_rect(fill = "white")) +
  labs(title = "Week 6 EDS 216: Meta-Analysis and Systematic Reviews")

cloud_6
# use patchwork to stack the graphs

cloud_1 / cloud_2_3 / cloud_4 / cloud_5 / cloud_6

What do MEDS students do outside of class?

# read in the csv of SB activities
data_activities <- read_csv(here("data", "sb_activities_data.csv"))

# make a data frame of vote counts for the SB activities bar chart

activities_clean <- data.frame(table(data_activities$"sb_activities"))

SB_activities <- ggplot(activities_clean, aes(y = reorder(Var1, +Freq), x = Freq)) +
  # geom_col() draws bars from precomputed counts; a constant outline color goes outside aes()
  geom_col(aes(fill = Var1), color = "blue") +
  scale_fill_paletteer_d("dutchmasters::milkmaid") +
  theme(legend.position = "none",
        panel.grid = element_blank(),
        panel.background = element_rect(fill = "white")) +
  labs(title = "MEDS Favorite Santa Barbara Activities",
       y = "Activity",
       x = "Total Votes")

SB_activities

Bar chart showing the MEDS students' favorite activities in Santa Barbara. The most popular activities are surfing, going to the beach, and biking.

What are the MEDS students excited for in Fall quarter?

Finally, here are just a few of the responses we got to our survey question about which aspects of fall quarter MEDS students are looking forward to.

“Expanding my coding fundamentals, Tidy Tuesday’s, building a portfolio and updating my website, and working collaboratively on the capstone project.”

“I’m excited to build on the foundation we created this summer!”

“Excited to work with some spatial data in Frew’s next class!”

“I’m looking forward to learning more skills in data science and be able to apply them to assignments and projects in our classes. I’m also looking forward to learning more about potential future careers in environmental data science.”

Meet the MEDS cohort!

Below you’ll find photos of our survey participants, the MEDS 2022 cohort!

Thanks to you all for your help on this blog post, and for being wonderful collaborators in everything this summer, from giving project presentations, to just checking parentheses.

The MEDS 2022 cohort gathered at NCEAS downtown after class during summer session.

Members of the MEDS cohort in downtown Santa Barbara celebrating completing the first half of summer session classes with faculty and their pets.